For a random matrix following a Wishart distribution, we derive formulas for the expectation and the covariance matrix of compound matrices. The compound matrix of order $m$ is populated by all $m\times m$-minors of the Wishart matrix. Our results yield first and second moments of the minors of the sample covariance matrix for multivariate normal observations. This work is motivated by the fact that such minors arise in the expression of constraints on the covariance matrix in many classical multivariate problems.

1. Introduction. Conditional independence constitutes one of the key concepts in multivariate statistical modeling. In a multivariate normal random vector $X = (X_1,\dots,X_r)^T \sim \mathcal{N}_r(\mu,\Sigma)$, conditional independence expresses itself in the vanishing of minors, that is, subdeterminants of the positive definite covariance matrix. Let $I,J,K \subseteq [r] := \{1,\dots,r\}$ be three pairwise disjoint index sets. Then $X_I$ and $X_J$ are conditionally independent given $X_K$, in symbols $X_I \perp\!\!\!\perp X_J \mid X_K$, if and only if

(1.1)    $\det(\Sigma_{(\{i\}\cup K)\times(\{j\}\cup K)}) = 0 \quad \forall\, i\in I,\ j\in J.$

The restrictions (1.1) correspond to vanishing partial correlations and can thus be tested using sample partial correlations, which yields a simple approach to model selection and assessment of goodness of fit of Gaussian independence models.
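As a small numerical illustration of (1.1), the following sketch uses Python with NumPy, which is an assumption of this example and not a tool used in the paper. It relies on the standard fact that, for a Gaussian vector, conditional independence of two coordinates given all remaining ones is equivalent to a zero entry in the inverse covariance matrix. The precision matrix below is hypothetical and chosen only for illustration.

```python
import numpy as np

def ci_minor(sigma, i, j, K):
    """Minor det(Sigma_{({i} u K) x ({j} u K)}) from (1.1); indices are 0-based."""
    rows = [i] + list(K)
    cols = [j] + list(K)
    return np.linalg.det(sigma[np.ix_(rows, cols)])

# Hypothetical 4x4 precision matrix (diagonally dominant, hence positive
# definite) with a zero in position (0, 1), so that X_0 and X_1 are
# conditionally independent given (X_2, X_3).
precision = np.array([[2.0, 0.0, 0.5, 0.3],
                      [0.0, 2.0, 0.4, 0.2],
                      [0.5, 0.4, 2.0, 0.1],
                      [0.3, 0.2, 0.1, 2.0]])
sigma = np.linalg.inv(precision)

minor_01 = ci_minor(sigma, 0, 1, [2, 3])   # vanishes up to rounding error
minor_02 = ci_minor(sigma, 0, 2, [1, 3])   # nonzero: precision[0, 2] != 0
```

The minor for the pair (0, 1) is zero up to floating-point error, while the minor for the pair (0, 2) is not, matching the nonzero entry of the precision matrix.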

M. DRTON, H. MASSAM AND I. OLKIN

The situation becomes more complicated, however, in hidden variable models because conditional independences involving hidden variables may lead to constraints on the covariance matrix of the observed variables that no longer correspond to vanishing partial correlations. Spearman's [14] tetrads are the classic example of such constraints. A tetrad is a $2\times 2$-minor $\det(\Sigma_{ij\times k\ell})$ for which $\{i,j\}\cap\{k,\ell\}=\emptyset$. Tetrads are the defining equality constraints for one-factor analysis [6], but also arise in other Gaussian hidden variable models. (Recall that in factor analysis, observed variables are conditionally independent given hidden factors.)

Given a sample from a $\mathcal{N}_r(\mu,\Sigma)$ distribution, the (joint) vanishing of tetrads can be tested. Rejection of this hypothesis suggests that the model for which the tetrads would vanish is inappropriate for the data. The route commonly taken when testing the vanishing of a tetrad is to standardize the sample tetrad and compare the result to the standard normal distribution. In particular, this approach avoids numerical maximization of the complicated likelihood functions of hidden variable models; we refer the reader to the examples discussed in [3, 8, 15]. The difficulty in this procedure is how to standardize the sample tetrad, a problem solved by Wishart [16], who found the sampling variance of the tetrad.

However, Wishart's result only applies to $2\times 2$-minors, which has limited
the application of the above constraint-based inference approach. In this 
paper we greatly generalize Wishart's result to obtain the covariance matrix 
of higher-order minors of a Wishart matrix, a problem that is also of intrinsic 
distribution-theoretic interest. In Section 2, we clarify the role of higher- 
order minors in hidden variable models. In Section 3, we present some basic 
results based on simple but powerful invariance arguments for compound 
matrices. Together with the properties of the Choleski decomposition of a 
Wishart matrix, these results allow us to compute, in Sections 4 and 5, the 
expectations and covariance matrix of minors of arbitrary Wishart matrices. 
In our conclusion in Section 6, we comment on future research directions 
and give an example of constraint-based inference based on 3 x 3-minors. 
The Appendix contains the proofs of two lemmas as well as an interesting 
auxiliary result on the mean of the determinant of a noncentral Wishart 
covariance matrix. 

2. Off-diagonal minors and hidden variables. Tetrads are $2\times 2$-minors that do not involve any diagonal elements of the covariance matrix $\Sigma$. We call any minor with this property an off-diagonal minor. In seminal work, Spirtes, Glymour and Scheines [15], Theorem 6.10, have characterized the tetrad relations in covariance matrices from directed Gaussian graphical models. The characterization of the vanishing of higher-order off-diagonal minors is still an open problem, but in Proposition 2.2 below we are able to give simple sufficient conditions. Proposition 2.2(ii) applies in particular to factor analysis with $m-1$ factors; see also [6].

Consider a random vector $X = (X_1,\dots,X_r)^T \sim \mathcal{N}_r(\mu,\Sigma)$ with $r \ge 2m$ components. Let $I, J \subseteq [r]$ be two disjoint index sets of cardinality $|I| = |J| = m \ge 1$.
Lemma 2.1. If $K \subseteq [r]\setminus(I\cup J)$, then $X_I \perp\!\!\!\perp X_J \mid X_K$ if and only if $\det(\Sigma_{G\times H}) = 0$ for all $G \subseteq I\cup K$ and $H \subseteq J\cup K$ of cardinality $|G| = |H| = |K| + 1$.

Proof. The claimed vanishing of minors implies (1.1) and thus the 
conditional independence. Conversely, the conditional independence implies 
that 

The second equality implies $\operatorname{rank}(\Sigma_{(I\cup K)\times(J\cup K)}) \le |K|$ and thus the claim. □

Proposition 2.2. (i) If $X_i \perp\!\!\!\perp X_J$ for some $i \in I$, then $\det(\Sigma_{I\times J}) = 0$.

(ii) Suppose the partitions $I = I_1 \cup I_2$ and $J = J_1 \cup J_2$ have $I_1, J_1 \neq \emptyset$ ($I_2$ or $J_2$ may be empty). Let $K_1$ and $K$ be subsets of $[r]\setminus(I\cup J)$ such that $K_1 \subseteq K$ and $|K| + |I_2| + |J_2| \le |K_1| + m - 1$. If $X_{I_1} \perp\!\!\!\perp X_{J_1} \mid X_{K\cup I_2\cup J_2}$ and $X_I \perp\!\!\!\perp X_{K_1}$, then $\det(\Sigma_{I\times J}) = 0$. The proposition states in particular that if $X_I \perp\!\!\!\perp X_J \mid X_K$ for $K \subseteq [r]\setminus(I\cup J)$ with $|K| \le m-1$, then $\det(\Sigma_{I\times J}) = 0$.

Proof. (i) Immediate. (ii) By Lemma 2.1, $\operatorname{rank}(\Sigma_{(I\cup K)\times(J\cup K)}) \le |I_2| + |J_2| + |K|$ and thus $\det(\Sigma_{(I\cup K_1)\times(J\cup K_1)}) = 0$. Since $\Sigma_{I\times K_1} = 0$, it holds that $\det(\Sigma_{(I\cup K_1)\times(J\cup K_1)}) = \det(\Sigma_{I\times J})\det(\Sigma_{K_1\times K_1})$, and the claim follows because $\det(\Sigma_{K_1\times K_1}) > 0$. The last statement of the proposition is obtained from (ii) by taking $I_2 = J_2 = K_1 = \emptyset$. □

For an example in which an $m\times m$-minor yields the only equality constraint on the covariance matrix, consider $2m + (m-1)$ random variables $X_1,\dots,X_{2m}, Y_1,\dots,Y_{m-1}$. Define an acyclic digraph (DAG) $G_m$ with these random variables as vertices and edges as follows. Every variable $Y_i$ is adjacent to every one of the variables $X_j$ by a directed edge $Y_i \to X_j$. Every pair of vertices in $\{X_1,\dots,X_m\}$ is joined by an edge, and the same holds for every pair of vertices in $\{X_{m+1},\dots,X_{2m}\}$. For uniqueness assume that $X_i \to X_j$ implies $i < j$. Figure 1 shows the graph $G_3$, which we will take up in a data example in the conclusion in Section 6.

In the remainder of this section, let $I = [m]$ and $J = \{m+1,\dots,2m\}$. The graph $G_m$ encodes that $X_I$ is conditionally independent of $X_J$ given $Y_{[m-1]}$ and that the random variables $Y_i$ are completely independent; see, for example, [10] for details on the stochastic interpretation of directed graphs. Treating $Y_1,\dots,Y_{m-1}$ as hidden yields a Gaussian model for $(X_1,\dots,X_{2m})^T$. It can be shown that this model contains exactly those distributions $\mathcal{N}_{2m}(\mu,\Sigma)$ that have a covariance matrix of the form $\Sigma = \Omega + \Lambda\Lambda^T$, where $\Lambda$ is an arbitrary $2m\times(m-1)$-matrix and $\Omega$ is a positive definite block-diagonal matrix; $\Omega_{I\times J} = 0$. Let $C_m$ be the set of covariance matrices $\Sigma$ in this model. The following lemma is proven in the Appendix.
Lemma 2.3. If a polynomial $f$ in the entries of the covariance matrix $\Sigma$ evaluates to zero at every matrix in $C_m$, then $f$ is a polynomial multiple of the off-diagonal $m\times m$-minor $\det(\Sigma_{I\times J})$.
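The constraint behind Lemma 2.3 can be seen numerically. The sketch below (Python with NumPy, an assumption of this illustration; the random loadings and blocks are hypothetical) builds a covariance matrix of the form $\Omega + \Lambda\Lambda^T$ with a block-diagonal $\Omega$ and $m-1$ hidden factors, and checks that the off-diagonal $m\times m$-minor vanishes because the $I\times J$ block equals $\Lambda_I\Lambda_J^T$ and therefore has rank at most $m-1$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3
I = list(range(m))            # 0-based version of [m]
J = list(range(m, 2 * m))     # 0-based version of {m+1, ..., 2m}

# Lambda: arbitrary 2m x (m-1) loading matrix (hypothetical random values).
Lam = rng.standard_normal((2 * m, m - 1))

# Omega: positive definite and block-diagonal, so that Omega_{I x J} = 0.
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, m))
Omega = np.zeros((2 * m, 2 * m))
Omega[np.ix_(I, I)] = A @ A.T + np.eye(m)
Omega[np.ix_(J, J)] = B @ B.T + np.eye(m)

Sigma = Omega + Lam @ Lam.T
off_diag_block = Sigma[np.ix_(I, J)]      # equals Lam_I Lam_J^T
minor = np.linalg.det(off_diag_block)     # zero up to rounding error
rank = np.linalg.matrix_rank(off_diag_block)
```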

3. Invariance under orthogonal transformations. Given that minors of 
covariance matrices arise so naturally in independence models, it is inter- 
esting to study their natural estimators, namely the minors of the sample 
covariance matrix. Up to a scaling factor depending on the sample size, such 
sample minors are distributed like the minors of Wishart matrices, which 
arise as follows. 

Let $X \in \mathbb{R}^{r\times n}$ be a matrix whose columns are independent random vectors distributed according to the multivariate normal distribution $\mathcal{N}_r(0,\Sigma)$ with positive definite covariance matrix $\Sigma \in \mathbb{R}^{r\times r}$. Then $S = XX^T$ is distributed according to the Wishart distribution with scale parameter matrix $\Sigma$ and $n$ degrees of freedom, in symbols, $S \sim \mathcal{W}_r(n,\Sigma)$. We refer to the Wishart distribution $\mathcal{W}_r(n,I_r)$ with the identity matrix $I_r \in \mathbb{R}^{r\times r}$ as scale parameter as the standard Wishart distribution.

Simple invariance arguments based on ideas from Olkin and Rubin [13] 
(see also [4] and [7], Problem 4, page 330) will permit us to learn much about 
the standard case. 

Definition 3.1. Let $O(r)$ be the group of orthogonal matrices in $\mathbb{R}^{r\times r}$. The distribution of a symmetric random matrix $V \in \mathbb{R}^{r\times r}$ is orthogonally invariant if, for all $G \in O(r)$, the distribution of $GVG^T$ is identical to the distribution of $V$. We will say, for brevity, that $V \in \mathbb{R}^{r\times r}$ is orthogonally invariant.

For $S \sim \mathcal{W}_r(n,\Sigma)$ and $G \in O(r)$, we have that $GSG^T \sim \mathcal{W}_r(n, G\Sigma G^T)$, and hence, the standard Wishart distribution $\mathcal{W}_r(n,I_r)$ is orthogonally invariant.

The objects of our study are minors $\det(W_{I\times J})$ or $\det(S_{I\times J})$ that are specified by two subsets $I, J \subseteq [r]$ of equal cardinality $|I| = |J| = m$. We introduce the notation

$\binom{[r]}{m} = \{I \subseteq [r] : |I| = m\}, \quad m \in [r].$

Fig. 1. The graph $G_3$ with two complete subgraphs joined through two hidden variables.



Proposition 3.2. Let $I, J \in \binom{[r]}{m}$. If the symmetric random matrix $V \in \mathbb{R}^{r\times r}$ is orthogonally invariant, then

$E[\det(V_{I\times J})] = \begin{cases} E[\det(V_{[m]\times[m]})], & \text{if } I = J,\\ 0, & \text{otherwise.}\end{cases}$

Proof. We extend the proof of [13], Lemma 1, which treats the case $m = 1$, in which the minors reduce to individual entries of $V$. Let $I, J \in \binom{[r]}{m}$ be two distinct subsets. For $j \in J\setminus I$, let $D_j \in O(r)$ be the diagonal matrix equal to the identity matrix except for entry $(j,j)$, which is equal to $-1$. Then $D_jVD_j$ differs from $V$ in that all off-diagonal entries of the $j$th row and column have been negated. Since $j \in J$ but $j \notin I$, $\det[(D_jVD_j)_{I\times J}] = -\det(V_{I\times J})$. Thus, $E[\det(V_{I\times J})] = 0$ is implied by $E[\det(V_{I\times J})] = E[\det((D_jVD_j)_{I\times J})] = -E[\det(V_{I\times J})]$.

Since $|I| = |J|$, we can find a permutation that maps the indices in $I$ to those in $J$. Let $P = P_{IJ} \in O(r)$ be the matrix representing this permutation. Then $(PVP^T)_{I\times I} = V_{J\times J}$, and $E[\det(V_{I\times I})] = E\{\det[(PVP^T)_{I\times I}]\} = E[\det(V_{J\times J})]$. It follows that $E[\det(V_{I\times I})] = E[\det(V_{[m]\times[m]})]$ for all $I \in \binom{[r]}{m}$. □
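The sign-flip step of the proof is a purely algebraic fact and can be checked on a single symmetric matrix. The following sketch (Python with NumPy, an assumption of this illustration; the matrix is random and the index sets are hypothetical choices) conjugates by the reflection $D_j$ and compares minors.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 4
A = rng.standard_normal((r, r))
V = A + A.T                        # an arbitrary symmetric matrix

def minor(M, I, J):
    """det(M_{I x J}) with 0-based index lists I and J."""
    return np.linalg.det(M[np.ix_(I, J)])

I, J = [0, 1], [0, 2]              # I != J, and the index j = 2 lies in J \ I
D = np.eye(r)
D[2, 2] = -1.0                     # the reflection D_j for j = 2
flipped = D @ V @ D

lhs = minor(flipped, I, J)         # det[(D_j V D_j)_{I x J}]
rhs = -minor(V, I, J)              # -det(V_{I x J})
principal_unchanged = np.isclose(minor(flipped, I, I), minor(V, I, I))
```

Here `lhs` and `rhs` agree, while the principal minor indexed by `I` is left unchanged because `j` does not belong to `I`.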

Our approach to determine the moment structure of the minors of a Wishart matrix is based on the following ideas. First, recall that if $S \sim \mathcal{W}_r(n,\Sigma)$ and $W \sim \mathcal{W}_r(n,I_r)$, then

(3.1)    $S \stackrel{d}{=} \Sigma^{1/2} W \Sigma^{1/2} \sim \mathcal{W}_r(n, \Sigma^{1/2} I_r \Sigma^{1/2}) = \mathcal{W}_r(n,\Sigma).$

[For notational simplicity, we will use the symmetric square root throughout the paper but nonsymmetric square roots (e.g., lower triangular) could be used instead.] Second, recall that for a matrix $A \in \mathbb{R}^{r\times r}$ and an integer $m \in [r]$, the $m$th compound of $A$ is the matrix

$A^{(m)} = (\det(A_{I\times J}))_{I,J\in\binom{[r]}{m}} \in \mathbb{R}^{\binom{r}{m}\times\binom{r}{m}}$

that is populated with all $m\times m$-minors of $A$. If $m = 0$, we set $A^{(0)} = 1 \in \mathbb{R}$. The Binet-Cauchy theorem (see, e.g., Marshall and Olkin [11], page 503 and Aitken [1], Chapter V) states that

(3.2)    $(AB)^{(m)} = A^{(m)} B^{(m)},$
which allows us to use (3.1) for the transfer from standard to general Wishart matrices. The last ingredient to our approach is the fact that the products

$\det(S_{I\times J})\det(S_{K\times L}), \quad I,J,K,L \in \binom{[r]}{m},$

which are exactly the quantities of interest for studying the variance-covariance structure of minors of $S$, are the entries of the Kronecker product $S^{(m)} \otimes S^{(m)}$.

The next proposition states that the first and second moments of the compound matrix $S^{(m)}$ can be obtained from those of the compound matrix $W^{(m)}$ for a standard Wishart matrix. The result follows from (3.1) and (3.2).



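Proposition 3.3. Let $S \sim \mathcal{W}_r(n,\Sigma)$ and $W \sim \mathcal{W}_r(n,I_r)$, and let $\Sigma^{1/2}$ denote the unique symmetric and positive definite square root of $\Sigma$. Then

$$
\begin{aligned}
E[S^{(m)}] &= (\Sigma^{1/2})^{(m)}\, E[W^{(m)}]\, (\Sigma^{1/2})^{(m)},\\
E[S^{(m)} \otimes S^{(m)}] &= [(\Sigma^{1/2})^{(m)} \otimes (\Sigma^{1/2})^{(m)}]\,(E[W^{(m)} \otimes W^{(m)}])\,[(\Sigma^{1/2})^{(m)} \otimes (\Sigma^{1/2})^{(m)}],\\
\mathrm{Cov}[S^{(m)}] &= E[S^{(m)} \otimes S^{(m)}] - (E[S^{(m)}] \otimes E[S^{(m)}])\\
&= [(\Sigma^{1/2})^{(m)} \otimes (\Sigma^{1/2})^{(m)}]\,(\mathrm{Cov}[W^{(m)}])\,[(\Sigma^{1/2})^{(m)} \otimes (\Sigma^{1/2})^{(m)}].
\end{aligned}
$$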
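The two ingredients behind Proposition 3.3, the compound matrix and the Binet-Cauchy theorem (3.2), can be illustrated as follows. This is a minimal sketch in Python with NumPy (an assumption of this illustration); the helper `compound` is a hypothetical name, and index sets are listed in lexicographic order, which is one possible convention.

```python
from itertools import combinations
import numpy as np

def compound(A, m):
    """m-th compound of a square matrix A: all m x m minors det(A_{I x J}),
    with the index sets I and J listed in lexicographic order."""
    r = A.shape[0]
    subsets = list(combinations(range(r), m))
    C = np.empty((len(subsets), len(subsets)))
    for a, I in enumerate(subsets):
        for b, J in enumerate(subsets):
            C[a, b] = np.linalg.det(A[np.ix_(I, J)])
    return C

rng = np.random.default_rng(2)
r, m, n = 4, 2, 7

# Binet-Cauchy: the compound of a product is the product of the compounds.
A = rng.standard_normal((r, r))
B = rng.standard_normal((r, r))
binet_cauchy_ok = np.allclose(compound(A @ B, m), compound(A, m) @ compound(B, m))

# Transfer from a standard Wishart draw W to S = Sigma^{1/2} W Sigma^{1/2}.
G = rng.standard_normal((r, r))
Sigma = G @ G.T + np.eye(r)
evals, evecs = np.linalg.eigh(Sigma)
root = evecs @ np.diag(np.sqrt(evals)) @ evecs.T     # symmetric square root
Z = rng.standard_normal((r, n))
W = Z @ Z.T
S = root @ W @ root
transfer_ok = np.allclose(compound(S, m),
                          compound(root, m) @ compound(W, m) @ compound(root, m))
```

The second check is the pathwise identity that, after taking expectations, gives the first formula of the proposition.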

Proposition 3.3 is only useful if we are able to compute the necessary moments of $W^{(m)}$. However, the invariance of $W$ under the orthogonal group tells us a great deal about these moments. The full first and second moment structure of $W^{(m)}$ will be derived in Corollary 4.2 and Theorem 4.5.

In the next result $I\,\Delta\,J$ denotes the symmetric difference $(I\setminus J)\cup(J\setminus I)$.

Proposition 3.4. Let $I,J,K,L \in \binom{[r]}{m}$, and let $V \in \mathbb{R}^{r\times r}$ be orthogonally invariant. If $I\,\Delta\,J \neq K\,\Delta\,L$, then

(3.3)    $E[\det(V_{I\times J})\det(V_{K\times L})] = 0.$

Moreover, under any permutation $\sigma$ of the indices in $[r]$,

(3.4)    $E[\det(V_{I\times J})\det(V_{K\times L})] = E[\det(V_{\sigma(I)\times\sigma(J)})\det(V_{\sigma(K)\times\sigma(L)})].$

Proof. Again we extend the ideas in [13], Lemma 1.

Let $I\,\Delta\,J \neq K\,\Delta\,L$. Assume without loss of generality that there exists an index $j \in (I\,\Delta\,J)\setminus(K\,\Delta\,L)$. Let $D_j$ be the diagonal matrix defined in the proof of Proposition 3.2. Recall that the action of $D_j$ negates the $j$th row and column in $V$. By choice of $j \in I\,\Delta\,J$, it holds that $\det[(D_jVD_j)_{I\times J}] = -\det(V_{I\times J})$. Since either $j \in K\cap L$ or $j \notin K\cup L$, it holds further that $\det[(D_jVD_j)_{K\times L}] = \det(V_{K\times L})$. Then (3.3), as well as (3.4), follows immediately from the orthogonal invariance of $V$. □

Example 3.5. Let $m = 2$ and $r = 4$. Applying the permutation $\sigma = (1)(23)(4)$, (3.4) implies that $E[\det(V_{12\times 12})^2] = E[\det(V_{13\times 13})^2]$, but because there is no permutation of $\{1,2,3,4\}$ that sends $\{1,2\}$ to $\{1,2\}$ and $\{1,2\}$ to $\{2,4\}$, $E[\det(V_{12\times 12})^2]$ and $E[\det(V_{12\times 12})\det(V_{24\times 24})]$ need not be equal. We will illustrate the use of Proposition 3.4 further in Example 4.6 where we consider a standard Wishart distribution.

4. Choleski-decomposition of a standard Wishart matrix. The arguments 
presented in Section 3 determine the first and second moments of minors of 
orthogonally invariant random matrices only up to constants. In this section we determine these constants for the standard Wishart distribution $\mathcal{W}_r(n,I_r)$. For this task we use the Choleski-decomposition that has the following convenient distributional property; see, for example, Muirhead [12], Theorem 3.2.14, for a proof of this classical result.

Lemma 4.1. Let $W$ follow the $\mathcal{W}_r(n,I_r)$ distribution with $n \ge r$. Let $T = (t_{ij})_{1\le i,j\le r}$ be lower-triangular with positive diagonal entries such that $W = TT^T$. Then the $t_{ij}$, $i \ge j$, are independent random variables distributed as $t_{ii}^2 \sim \chi^2_{n-i+1}$, $i = 1,\dots,r$, and $t_{ij} \sim \mathcal{N}(0,1)$, $1 \le j < i \le r$.

We remark that the elements $t_{ij}$ have been called rectangular coordinates. Since $\det(TT^T) = \prod_{i=1}^r t_{ii}^2$ and $E[t_{ii}^2] = E[\chi^2_{n-i+1}] = n-i+1$, we obtain the following corollary to Proposition 3.2 and to Lemma 4.1 applied to $W_{I\times I}$.



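Corollary 4.2. Let $I,J \in \binom{[r]}{m}$. If $W \sim \mathcal{W}_r(n,I_r)$ with $n \ge m$, then

$E[\det(W_{I\times J})] = \begin{cases} n!/(n-m)!, & \text{if } I = J,\\ 0, & \text{otherwise,}\end{cases}$

and

$E[W^{(m)}] = \frac{n!}{(n-m)!}\, I_{\binom{r}{m}}.$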
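The distributional statement of Lemma 4.1 gives a direct way to simulate a standard Wishart matrix, and Corollary 4.2 can then be checked by Monte Carlo. The sketch below uses Python with NumPy (an assumption of this illustration); the function names, the seed and the sample size are hypothetical choices, and the simulated mean is only expected to be close to the exact value.

```python
import math
import numpy as np

def sample_cholesky_factor(r, n, rng):
    """Draw T as in Lemma 4.1: t_ii^2 ~ chi^2_{n-i+1} and t_ij ~ N(0,1) for i > j."""
    T = np.tril(rng.standard_normal((r, r)), k=-1)   # strictly lower part
    dfs = n - np.arange(r)                           # n, n-1, ..., n-r+1
    T[np.diag_indices(r)] = np.sqrt(rng.chisquare(dfs))
    return T

def expected_principal_minor(n, m):
    """n!/(n-m)!, the mean of an m x m principal minor (Corollary 4.2)."""
    return math.factorial(n) // math.factorial(n - m)

rng = np.random.default_rng(3)
r, n, m = 3, 6, 2
draws = []
for _ in range(10000):
    T = sample_cholesky_factor(r, n, rng)
    W = T @ T.T
    draws.append(np.linalg.det(W[:m, :m]))           # minor with I = J = [m]
mc_mean = float(np.mean(draws))
exact = expected_principal_minor(n, m)               # 6 * 5 = 30
```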

The Choleski-decomposition $W = TT^T$ of a standard Wishart matrix $W$ reveals additional information. In the remainder of this section assume that $n \ge r$, which implies that $T$ is of full rank with probability 1.
Lemma 4.3. Let $c \in \{0,\dots,m\}$ be an integer such that there exists a subset $J \subseteq \{m+1,\dots,r\}$ of cardinality $|J| = m - c$. Then

$\det(W_{[m]\times([c]\cup J)}) = \Bigl(\prod_{i=1}^{c} t_{ii}^2\Bigr)\Bigl(\prod_{j=c+1}^{m} t_{jj}\Bigr)\det(T_{J\times\{c+1,\dots,m\}}).$

Proof. Let $I = \{c+1,\dots,m\} = [m]\setminus[c]$. From the partitioning

$W_{[m]\times([c]\cup J)} = \begin{pmatrix} W_{[c]\times[c]} & W_{[c]\times J}\\ W_{I\times[c]} & W_{I\times J}\end{pmatrix}$

we obtain that

(4.1)    $\det(W_{[m]\times([c]\cup J)}) = \det(W_{[c]\times[c]})\,\det(W_{I\times J} - W_{I\times[c]} W_{[c]\times[c]}^{-1} W_{[c]\times J}).$

Clearly, $\det(W_{[c]\times[c]}) = \prod_{i=1}^{c} t_{ii}^2$, so that we are left with studying the second factor on the right-hand side of (4.1).

For a subset $D \subseteq [r]$, let $T_D = T_{D\times[r]}$ be the submatrix comprising all rows of $T$ with index in $D$. Then we can write

$\det(W_{I\times J} - W_{I\times[c]} W_{[c]\times[c]}^{-1} W_{[c]\times J}) = \det\{T_I [I_r - T_{[c]}^T (T_{[c]} T_{[c]}^T)^{-1} T_{[c]}] T_J^T\}.$

The matrix $I_r - T_{[c]}^T (T_{[c]} T_{[c]}^T)^{-1} T_{[c]}$ represents the orthogonal projection on the kernel of $T_{[c]}$. Since $T$ is lower triangular with, by assumption, nonzero diagonal entries, it holds that $\ker(T_{[c]}) = \{0\}^c \times \mathbb{R}^{r-c}$, which means that the projection considered replaces the first $c$ entries of a vector in $\mathbb{R}^r$ by zeros. Therefore,

$\det\{T_I [I_r - T_{[c]}^T (T_{[c]} T_{[c]}^T)^{-1} T_{[c]}] T_J^T\} = \det(T_{I\times\{c+1,\dots,r\}}\, T_{J\times\{c+1,\dots,r\}}^T).$

By the Binet-Cauchy theorem,

(4.2)    $\det(T_{I\times\{c+1,\dots,r\}}\, T_{J\times\{c+1,\dots,r\}}^T) = \sum_{D\subseteq\{c+1,\dots,r\},\ |D|=m-c} \det(T_{I\times D})\det(T_{J\times D}) = \det(T_{I\times I})\det(T_{J\times I}).$

The second equality in (4.2) holds because if $D \neq I = \{c+1,\dots,m\}$, then the matrix $T_{I\times D}$ contains a column consisting entirely of zeros and thus $\det(T_{I\times D}) = 0$. Our claim follows from $\det(T_{I\times I}) = \prod_{j=c+1}^{m} t_{jj}$. □
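Because Lemma 4.3 is an algebraic identity for any lower-triangular $T$ with nonzero diagonal, it can be verified on a single random matrix. The sketch below uses Python with NumPy (an assumption of this illustration); the function name and the particular values of $r$, $m$, $c$ and $J$ are hypothetical.

```python
import numpy as np

def lemma_4_3_sides(T, m, c, J):
    """Return both sides of the identity in Lemma 4.3 for W = T T^T.
    J holds 1-based indices in {m+1, ..., r} with |J| = m - c, and c < m."""
    W = T @ T.T
    rows = list(range(m))                          # [m]
    cols = list(range(c)) + [j - 1 for j in J]     # [c] u J
    lhs = np.linalg.det(W[np.ix_(rows, cols)])
    diag = np.diag(T)
    block = T[np.ix_([j - 1 for j in J], list(range(c, m)))]   # T_{J x {c+1..m}}
    rhs = np.prod(diag[:c] ** 2) * np.prod(diag[c:m]) * np.linalg.det(block)
    return lhs, rhs

rng = np.random.default_rng(4)
r, m, c = 5, 3, 1
T = np.tril(rng.standard_normal((r, r)))
T[np.diag_indices(r)] = np.abs(np.diag(T)) + 0.5   # positive diagonal entries
lhs, rhs = lemma_4_3_sides(T, m, c, J=[4, 5])
```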

From Lemma 4.3 we can deduce the distribution of a minor $\det(W_{I\times J})$.

Theorem 4.4. Let $I,J \in \binom{[r]}{m}$ have a (possibly empty) intersection of cardinality $|I\cap J| = c \ge 0$. If $W \sim \mathcal{W}_r(n,I_r)$, then

(4.3)    $\det(W_{I\times J}) \sim \Bigl(\prod_{i=1}^{c} W_i\Bigr)\Bigl(\prod_{i=c+1}^{m} \sqrt{W_i}\Bigr)\det(Z),$

where $W_i$, $i = 1,\dots,m$, are independent $\chi^2_{n-i+1}$ random variables and $Z = (Z_{ij}) \in \mathbb{R}^{(m-c)\times(m-c)}$ is a random matrix of independent $\mathcal{N}(0,1)$ random variables that are also independent of $(W_1,\dots,W_m)$. In particular,

$E[\det(W_{I\times J})^2] = \frac{n!}{(n-m)!}\cdot\frac{(n+2)!}{(n+2-|I\cap J|)!}\cdot(m-|I\cap J|)!$

and

$\mathrm{Var}[\det(W_{I\times J})] = \begin{cases} \dfrac{n!}{(n-m)!}\Bigl\{\dfrac{(n+2)!}{(n+2-m)!} - \dfrac{n!}{(n-m)!}\Bigr\}, & \text{if } I = J,\\[2ex] \dfrac{n!}{(n-m)!}\cdot\dfrac{(n+2)!}{(n+2-|I\cap J|)!}\cdot(m-|I\cap J|)!, & \text{if } |I\cap J| < m.\end{cases}$



Proof. By orthogonal invariance of the standard Wishart distribution, we can permute rows and columns of $W$ such that $I = [m]$ and $J = [c]\cup\{m+1,\dots,2m-c\}$. Thus (4.3) follows from Lemmas 4.3 and 4.1.

For the derivation of the second moment, recall that $E[\chi^2_n] = n$ and $E[(\chi^2_n)^2] = n(n+2)$. Let $\mathcal{S}_h$ be the group of permutations of $[h]$. Then

$E[\det(Z)^2] = E\Bigl[\Bigl(\sum_{\sigma\in\mathcal{S}_{m-c}} \operatorname{sgn}(\sigma)\prod_{i=1}^{m-c} Z_{i\sigma(i)}\Bigr)^2\Bigr] = \sum_{\sigma\in\mathcal{S}_{m-c}} \prod_{i=1}^{m-c} E[Z_{i\sigma(i)}^2] = (m-c)!,$

which yields $E[\det(W_{I\times J})^2]$. The variance is obtained using Corollary 4.2. □
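The moment formulas of Theorem 4.4 are easy to evaluate exactly with integer arithmetic. The following Python sketch (the function names are hypothetical) codes the second moment and the variance and compares them, for $m = 2$, with the polynomials $2n(2n+1)(n-1)$, $n(n+2)(n-1)$ and $2n(n-1)$ that are quoted in Example 4.6.

```python
from math import factorial

def second_moment_minor(n, m, c):
    """E[det(W_{IxJ})^2] for W ~ W_r(n, I_r) with |I| = |J| = m and |I n J| = c."""
    return ((factorial(n) // factorial(n - m))
            * (factorial(n + 2) // factorial(n + 2 - c))
            * factorial(m - c))

def variance_minor(n, m, c):
    """Var[det(W_{IxJ})]; the mean is n!/(n-m)! if I = J and zero otherwise."""
    second = second_moment_minor(n, m, c)
    if c == m:
        return second - (factorial(n) // factorial(n - m)) ** 2
    return second

# Values for a 2 x 2 minor with n = 10 degrees of freedom.
var_principal = variance_minor(10, 2, 2)   # I = J
var_overlap = variance_minor(10, 2, 1)     # |I n J| = 1
var_disjoint = variance_minor(10, 2, 0)    # I and J disjoint
```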

We next turn to moments of the form $E[\det(W_{I\times J})\det(W_{K\times L})]$ with $(I,J) \neq (K,L)$. By Proposition 3.4, this expectation is nonzero only if $I\,\Delta\,J = K\,\Delta\,L$. At this point, before proceeding to derive the desired expectation, we would like to emphasize that throughout the paper we consider an index set $I = \{i_1,\dots,i_m\}$ to be equipped with an ordering. Such an ordering yields an index sequence $(i_1,\dots,i_m)$ that dictates the order in which we list the rows (or columns) of a submatrix. Since our results so far did not depend on the choice of ordering, we kept this view implicit. For our next result, however, the order in which the indices in $I$ are listed matters since different orderings may lead to different signs of determinants due to the interchanging of the rows or columns in submatrices. For example,

(4.4)    $E[\det(W_{12\times 14})\det(W_{23\times 34})] = -E[\det(W_{12\times 14})\det(W_{23\times 43})].$

In the following theorem, the elements of four index sets $I, J, K, L$ are assumed to be ordered according to a total order of $[r]$ that achieves certain order relationships across the four sets. We write $A < B$ if all elements of $A \subseteq [r]$ are smaller than those of $B \subseteq [r]$, or if $A$ or $B$ is the empty set. (Note that this implies $\emptyset < A < \emptyset$.)

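Theorem 4.5. Let $I,J,K,L \in \binom{[r]}{m}$ such that $I\,\Delta\,J = K\,\Delta\,L$. Let

$\bar I = I\setminus(I\cap J), \quad \bar K = K\setminus(K\cap L), \quad \bar J = J\setminus(I\cap J), \quad \bar L = L\setminus(K\cap L).$

Moreover, assume that the indices in $I$, $J$, $K$ and $L$ are listed according to a total order in $[r]$ under which

$(I\cap J)\setminus(K\cap L) < \bar I < \bar J < (K\cap L)\setminus(I\cap J),$
$\bar I\cap\bar K < \bar I\cap\bar L, \qquad \bar J\cap\bar K < \bar J\cap\bar L.$

Under these conventions it holds that if $W \sim \mathcal{W}_r(n,I_r)$, then

$E[\det(W_{I\times J})\det(W_{K\times L})] = \frac{n!}{(n-m)!}\cdot\frac{(n+2)!}{(n+2-|I\cap J\cap K\cap L|)!}\cdot\frac{(n-m+|(I\cap J)\setminus(K\cap L)|)!}{(n-m)!}\cdot|\bar I\cap\bar K|!\;|\bar I\cap\bar L|!.$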



Theorem 4.5 yields, for example, that $E[\det(W_{12\times 13})\det(W_{24\times 34})] = n(n-1)^2$. However, it does not yield directly the value of $E[\det(W_{12\times 14})\det(W_{23\times 34})]$ in (4.4). Instead, we can obtain that $E[\det(W_{12\times 14})\det(W_{23\times 43})] = n(n-1)^2$. Hence, by (4.4), we find $E[\det(W_{12\times 14})\det(W_{23\times 34})] = -n(n-1)^2$.
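The product moments of Theorem 4.5 depend on the index sets only through a few cardinalities. The Python sketch below codes the closed form as it can be pieced together from Lemmas 4.7 and 4.8 and from Theorem 4.4; this reading of the formula, the function name and the argument names are assumptions of the illustration. The sign is the one obtained under the ordering conventions of the theorem, so a different listing of the indices may flip it, as in (4.4).

```python
from math import factorial

def product_moment(n, m, c_all, c_only, p, q):
    """E[det(W_{IxJ}) det(W_{KxL})] for W ~ W_r(n, I_r), I delta J = K delta L.
    c_all  = |I n J n K n L|
    c_only = |(I n J) minus (K n L)|
    p, q   = |Ibar n Kbar| and |Ibar n Lbar|, so c_all + c_only + p + q = m.
    """
    assert c_all + c_only + p + q == m

    def ratio(a, b):
        return factorial(a) // factorial(b)

    return (ratio(n, n - m) * ratio(n + 2, n + 2 - c_all)
            * ratio(n - m + c_only, n - m) * factorial(p) * factorial(q))

n = 5
# I = {1,2}, J = {1,3}, K = {2,4}, L = {3,4}: the example n(n-1)^2 in the text.
example_value = product_moment(n, 2, c_all=0, c_only=1, p=1, q=0)
```

Taking $(K,L) = (I,J)$ gives back the second moments of Theorem 4.4, which is a useful consistency check on the reading of the formula.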



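Example 4.6. If $m = 2$ and $r = 4$, then the covariance matrix $\mathrm{Cov}[W^{(m)}]$, which determines $\mathrm{Cov}[S^{(m)}]$, is a symmetric matrix of size $36\times 36$. Because $\mathrm{Cov}[W^{(m)}]$ is derived from the symmetric matrix $W$, we can restrict ourselves to unordered pairs of sets $(I,J) \in \binom{[r]}{m}\times\binom{[r]}{m}$ with possible equality $I = J$. There are 21 such unordered pairs. We represent $\mathrm{Cov}[W^{(m)}]$ as a symmetric block-diagonal $21\times 21$ matrix with blocks formed according to $I\,\Delta\,J$. Proposition 3.4 implies such block-diagonal structure also for the general case of arbitrary $m$ and $r$.

The first block is indexed by the six pairs $(I,I)$, $I \in \binom{[r]}{m}$, involves the principal minors and takes on the form

$$
\begin{array}{c|cccccc}
 & 12,12 & 13,13 & 14,14 & 23,23 & 24,24 & 34,34\\ \hline
12,12 & f_1 & f_4 & f_4 & f_4 & f_4 & 0\\
13,13 & f_4 & f_1 & f_4 & f_4 & 0 & f_4\\
14,14 & f_4 & f_4 & f_1 & 0 & f_4 & f_4\\
23,23 & f_4 & f_4 & 0 & f_1 & f_4 & f_4\\
24,24 & f_4 & 0 & f_4 & f_4 & f_1 & f_4\\
34,34 & 0 & f_4 & f_4 & f_4 & f_4 & f_1
\end{array}
$$

where from Theorems 4.4 and 4.5 respectively, we have $f_1 = 2n(2n+1)(n-1)$ and $f_4 = 2n(n-1)^2$. Next, we have a series of six blocks of size $2\times 2$, each involving two pairs $(I,J)$ and $(K,L)$ for which $I\,\Delta\,J = K\,\Delta\,L$ and $|I\cap J| = 1$, or equivalently, $|I\,\Delta\,J| = 2$. Two representatives of these six blocks are

$$
\begin{array}{c|cc}
 & 12,13 & 24,34\\ \hline
12,13 & f_2 & f_5\\
24,34 & f_5 & f_2
\end{array}
\qquad\text{and}\qquad
\begin{array}{c|cc}
 & 12,14 & 23,34\\ \hline
12,14 & f_2 & -f_5\\
23,34 & -f_5 & f_2
\end{array}
$$

where by Theorems 4.4 and 4.5 respectively, $f_2 = n(n+2)(n-1)$ and $f_5 = n(n-1)^2$. The last block is obtained for the pairs $(I,J)$ with $I, J$ disjoint, or equivalently, $I\,\Delta\,J = [r] = \{1,2,3,4\}$. With indices listed in increasing order, it takes the form

$$
\begin{array}{c|ccc}
 & 12,34 & 13,24 & 14,23\\ \hline
12,34 & f_3 & f_6 & -f_6\\
13,24 & f_6 & f_3 & f_6\\
14,23 & -f_6 & f_6 & f_3
\end{array}
$$

with $f_3 = 2n(n-1)$ and $f_6 = n(n-1)$.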

The remainder of this section is devoted to the proof of Theorem 4.5, in which we can assume that $r = \max(I\cup J\cup K\cup L)$. Note that since $|(I\cap J)\setminus(K\cap L)| = |(K\cap L)\setminus(I\cap J)|$, the formula in Theorem 4.5 is not changed if the order of $(I,J)$ and $(K,L)$ is reversed.

Lemma 4.7. If $I\cap J\cap K\cap L = C \neq \emptyset$, $|C| = c \ge 1$, then

$E[\det(W_{I\times J})\det(W_{K\times L})] = E[\det(\tilde W_{I^c\times J^c})\det(\tilde W_{K^c\times L^c})]\cdot E[\det(W_{C\times C})^2],$

where $A^c = A\setminus C$ for any subset $A \subseteq [r]$, and

$\tilde W = W_{[r]^c\times[r]^c} - W_{[r]^c\times C}\, W_{C\times C}^{-1}\, W_{C\times[r]^c} \sim \mathcal{W}_{r-c}(n-c, I_{r-c}).$

Proof. The claim follows from the fact that

$\det(W_{I\times J})\det(W_{K\times L}) = \det(W_{C\times C})^2 \det(\tilde W_{I^c\times J^c})\det(\tilde W_{K^c\times L^c})$

in conjunction with the independence of $W_{C\times C}$ and $\tilde W$ (see Lemma 5.2 below). □

Since Theorem 4.4 yields the term $E[\det(W_{C\times C})^2]$ appearing in Lemma 4.7, the proof of Theorem 4.5 is completed by the following lemma, which is proven in the Appendix.

Lemma 4.8. Let $I,J,K,L \in \binom{[r]}{m}$ such that $I\,\Delta\,J = K\,\Delta\,L$ and $I\cap J\cap K\cap L = \emptyset$. Define $\bar I, \bar J, \bar K, \bar L$ as in Theorem 4.5, and assume furthermore that $I\cap J < \bar I < \bar J < K\cap L$, $\bar I\cap\bar K < \bar I\cap\bar L$, and $\bar J\cap\bar K < \bar J\cap\bar L$. If $W \sim \mathcal{W}_r(n,I_r)$, then

$E[\det(W_{I\times J})\det(W_{K\times L})] = \frac{n!\,(n-m+c)!}{[(n-m)!]^2}\; p!\,(m-c-p)!,$

where $c = |I\cap J| = |K\cap L|$ and $p = |\bar I\cap\bar K| = |\bar J\cap\bar L|$.

5. Variances of minors. In Sections 3-4 we found the covariance matrix of the compound $S^{(m)}$ of a Wishart matrix $S \sim \mathcal{W}_r(n,\Sigma)$. However, due to the involved square roots $\Sigma^{1/2}$, the form of the individual entries of $\mathrm{Cov}[S^{(m)}]$ is not transparent. In this section, we derive explicit formulas for the variances of $m\times m$-minors with $m \le n$.

We begin by reviewing the well-known formula for a principal minor [2], Section 7.5.

Proposition 5.1. If $S \sim \mathcal{W}_r(n,\Sigma)$ and $I \in \binom{[r]}{m}$, then

$\mathrm{Var}[\det(S_{I\times I})] = \frac{n!}{(n-m)!}\Bigl\{\frac{(n+2)!}{(n+2-m)!} - \frac{n!}{(n-m)!}\Bigr\}\det(\Sigma_{I\times I})^2.$

Proof. Apply (3.1) with the submatrix $S_{I\times I}$ replacing the full Wishart matrix $S$ to obtain that $\mathrm{Var}[\det(S_{I\times I})] = \det(\Sigma_{I\times I})^2\cdot\mathrm{Var}[\det(W_{I\times I})]$, which in conjunction with Theorem 4.4 yields the claim. □

Next, we derive an explicit formula for the variance of off-diagonal minors of a general Wishart matrix $S \sim \mathcal{W}_r(n,\Sigma)$. From this formula and Proposition 5.1, a formula for the variance of arbitrary minors of $S$ is obtained in Theorem 5.7.

Let $I,J \in \binom{[r]}{m}$ be two disjoint subsets. Then the minor $\det(S_{I\times J})$ is off-diagonal in that it does not involve any diagonal elements of $S$. Let $S_{IJ\times IJ}$ and $\Sigma_{IJ\times IJ}$ be the $(I\cup J)\times(I\cup J)$-submatrix of $S$ and $\Sigma$, respectively. We partition these $2m\times 2m$-submatrices into four $m\times m$-submatrices according to $I$ and $J$, where we adopt the shorthand notation $S_{I\times I} = S_{II}$, $S_{I\times J} = S_{IJ}$, etc. Let

$S_{II.J} = S_{II} - S_{IJ} S_{JJ}^{-1} S_{JI} \quad\text{and}\quad \Sigma_{II.J} = \Sigma_{II} - \Sigma_{IJ}\Sigma_{JJ}^{-1}\Sigma_{JI}.$

Our line of attack in computing the variance of the off-diagonal minor $\det(S_{I\times J}) = \det(S_{IJ})$ is to employ the decomposition

(5.1)    $\mathrm{Var}[\det(S_{IJ})] = \mathrm{Var}[E[\det(S_{IJ}) \mid S_{JJ}]] + E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]].$

The evaluations of the two terms on the right-hand side of (5.1) are given in Lemmas 5.3 and 5.4, which are based on the following well-known result [12], Theorem 3.2.10.
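Lemma 5.2. If $S \sim \mathcal{W}_r(n,\Sigma)$ and $m \le n$, then $S_{JJ} \sim \mathcal{W}_m(n,\Sigma_{JJ})$, $S_{II.J} \sim \mathcal{W}_m(n-m,\Sigma_{II.J})$, and the random matrix $S_{II.J}$ is independent of $(S_{IJ}, S_{JJ})$. Finally, the conditional distribution of $S_{IJ}$ given $S_{JJ}$ is normal and such that

(5.2)    $(S_{IJ} S_{JJ}^{-1/2} \mid S_{JJ}) \sim \mathcal{N}_{m^2}(\Sigma_{IJ}\Sigma_{JJ}^{-1} S_{JJ}^{1/2},\ \Sigma_{II.J}\otimes I_m)$

(5.3)    $\iff (\Sigma_{II.J}^{-1/2} S_{IJ} S_{JJ}^{-1/2} \mid S_{JJ}) \sim \mathcal{N}_{m^2}(\Sigma_{II.J}^{-1/2}\Sigma_{IJ}\Sigma_{JJ}^{-1} S_{JJ}^{1/2},\ I_m\otimes I_m).$

Lemma 5.3. It holds that

$\mathrm{Var}[E[\det(S_{IJ}) \mid S_{JJ}]] = \frac{n!}{(n-m)!}\Bigl\{\frac{(n+2)!}{(n+2-m)!} - \frac{n!}{(n-m)!}\Bigr\}\cdot\det(\Sigma_{IJ})^2.$

Proof. By (5.2) in Lemma 5.2,

$$
\begin{aligned}
\mathrm{Var}[E[\det(S_{IJ}) \mid S_{JJ}]] &= \mathrm{Var}[E[\det(S_{IJ} S_{JJ}^{-1/2}) \mid S_{JJ}]\cdot\det(S_{JJ}^{1/2})]\\
&= \mathrm{Var}[\det(\Sigma_{IJ}\Sigma_{JJ}^{-1})\cdot\det(S_{JJ})]\\
&= \det(\Sigma_{IJ})^2\det(\Sigma_{JJ})^{-2}\,\mathrm{Var}[\det(S_{JJ})].
\end{aligned}
$$

Now the claim follows from Proposition 5.1. □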

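Lemma 5.4. Let $\Sigma^{IJ}$ denote the $I\times J$-submatrix of the inverse of $\Sigma_{IJ\times IJ}$. Then

$E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] = \det(\Sigma_{IJ\times IJ})\,\frac{n!}{(n-m)!}\sum_{k=0}^{m-1}(m-k)!\,\frac{(n+2)!}{(n+2-k)!}\,(-1)^k\operatorname{tr}\{(\Sigma_{JI}\Sigma^{IJ})^{(k)}\}.$

Proof. First note that

(5.4)    $E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] = \det(\Sigma_{II.J})\cdot E[\mathrm{Var}[\det(\Sigma_{II.J}^{-1/2} S_{IJ} S_{JJ}^{-1/2}) \mid S_{JJ}]\cdot\det(S_{JJ})].$

It follows from (5.3) that conditional on $S_{JJ}$, the entries of the matrix $\Sigma_{II.J}^{-1/2} S_{IJ} S_{JJ}^{-1/2}$ are independent normal random variables with variance 1, albeit these entries are not identically distributed as their means may differ in arbitrary fashion. We are led to the problem of computing $\mathrm{Var}[\det(X)]$, where the matrix $X \in \mathbb{R}^{m\times m}$ is distributed according to the multivariate normal distribution

$X \sim \mathcal{N}_{m^2}(A, I_m\otimes I_m), \quad A = \Sigma_{II.J}^{-1/2}\Sigma_{IJ}\Sigma_{JJ}^{-1} S_{JJ}^{1/2} \in \mathbb{R}^{m\times m}.$

Lemma A.1 provides an evaluation of $E[\det(X)^2]$, and from (5.4) we find that

(5.5)    $E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] = \det(\Sigma_{II.J})\sum_{k=0}^{m-1}(m-k)!\; E[\operatorname{tr}\{(S_{JJ}^{1/2}\Sigma_{JJ}^{-1}\Sigma_{JI}\Sigma_{II.J}^{-1}\Sigma_{IJ}\Sigma_{JJ}^{-1} S_{JJ}^{1/2})^{(k)}\}\cdot\det(S_{JJ})].$

Setting $C = \Sigma_{JJ}^{-1}\Sigma_{JI}\Sigma_{II.J}^{-1}\Sigma_{IJ}\Sigma_{JJ}^{-1}$, (5.5) simplifies to

$$
\begin{aligned}
E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] &= \det(\Sigma_{II.J})\sum_{k=0}^{m-1}(m-k)!\cdot E[\operatorname{tr}\{(C S_{JJ})^{(k)}\}\cdot\det(S_{JJ})]\\
&= \det(\Sigma_{II.J})\sum_{k=0}^{m-1}(m-k)!\cdot E[\operatorname{tr}\{C^{(k)} S_{JJ}^{(k)}\cdot\det(S_{JJ})\}]\\
&= \det(\Sigma_{II.J})\sum_{k=0}^{m-1}(m-k)!\cdot\operatorname{tr}\{C^{(k)} E[S_{JJ}^{(k)}\cdot\det(S_{JJ})]\}.
\end{aligned}
$$

Now, let $W_{JJ} = (\Sigma_{JJ})^{-1/2} S_{JJ} (\Sigma_{JJ})^{-1/2}$. As in (3.1), $W_{JJ} \sim \mathcal{W}_m(n, I_m)$. Thus

$E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] = \det(\Sigma_{II.J})\det(\Sigma_{JJ})\Bigl(\sum_{k=0}^{m-1}(m-k)!\cdot\operatorname{tr}\{C^{(k)}(\Sigma_{JJ}^{1/2})^{(k)} E[W_{JJ}^{(k)}\cdot\det(W_{JJ})](\Sigma_{JJ}^{1/2})^{(k)}\}\Bigr).$

The distribution of $W_{JJ}^{(k)}\cdot\det(W_{JJ})$ has the invariance property that for $G \in O(m)$,

$G^{(k)}(W_{JJ}^{(k)}\cdot\det(W_{JJ}))(G^T)^{(k)} \sim W_{JJ}^{(k)}\cdot\det(W_{JJ}).$

In analogy to Proposition 3.2 and the derivation of Theorem 4.4, it holds that

$E[W_{JJ}^{(k)}\cdot\det(W_{JJ})] = \frac{n!}{(n-m)!}\cdot\frac{(n+2)!}{(n+2-k)!}\cdot I_{\binom{m}{k}}.$

Because $\det(\Sigma_{II.J})\det(\Sigma_{JJ}) = \det(\Sigma_{IJ\times IJ})$, we therefore have that

$E[\mathrm{Var}[\det(S_{IJ}) \mid S_{JJ}]] = \det(\Sigma_{IJ\times IJ})\,\frac{n!}{(n-m)!}\sum_{k=0}^{m-1}(m-k)!\,\frac{(n+2)!}{(n+2-k)!}\operatorname{tr}\{(\Sigma_{JJ} C)^{(k)}\}.$

The claim now follows because, by simple considerations about the inverse of the partitioned matrix $\Sigma_{IJ\times IJ}$, it holds that $\Sigma_{JJ} C = -\Sigma_{JI}\Sigma^{IJ}$. □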

Combining Lemmas 5.3 and 5.4 according to (5.1) yields the following proposition.

Proposition 5.5. Let $I,J \in \binom{[r]}{m}$ be two disjoint subsets. Then the off-diagonal minor $\det(S_{I\times J}) = \det(S_{IJ})$ of the Wishart matrix $S \sim \mathcal{W}_r(n,\Sigma)$ has variance

$$
\begin{aligned}
\mathrm{Var}[\det(S_{IJ})] ={}& \frac{n!}{(n-m)!}\cdot\det(\Sigma_{IJ})^2\Bigl\{\frac{(n+2)!}{(n+2-m)!} - \frac{n!}{(n-m)!}\Bigr\}\\
&+ \frac{n!}{(n-m)!}\cdot\det(\Sigma_{IJ\times IJ})\Bigl(\sum_{k=0}^{m-1}(m-k)!\cdot\frac{(n+2)!}{(n+2-k)!}\cdot(-1)^k\operatorname{tr}\{(\Sigma_{JI}\Sigma^{IJ})^{(k)}\}\Bigr).
\end{aligned}
$$



Corollary 5.6 ([16]). In the special case $m = 2$ the off-diagonal minor $\det(S_{I\times J}) = \det(S_{IJ})$ is known as a tetrad, and

$\mathrm{Var}[\det(S_{IJ})] = n(n-1)\bigl[(n+2)\det(\Sigma_{II})\det(\Sigma_{JJ}) - n\det(\Sigma_{IJ\times IJ}) + 3n\det(\Sigma_{IJ})^2\bigr].$

Proof. The claim follows from Proposition 5.5, and the fact that if $m = 2$, then

$\operatorname{tr}(\Sigma_{JI}\Sigma^{IJ})\det(\Sigma_{IJ\times IJ}) = \det(\Sigma_{IJ\times IJ}) - \det(\Sigma_{II})\det(\Sigma_{JJ}) + \det(\Sigma_{IJ})^2.$ □
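For tetrads the variance in Corollary 5.6 only involves three determinants, so it is straightforward to evaluate. The following is a minimal sketch in Python with NumPy (an assumption of this illustration; the function name and the 0-based index convention are hypothetical). In practice an estimate of $\Sigma$ would be plugged in for the true covariance matrix.

```python
import numpy as np

def tetrad_variance(Sigma, I, J, n):
    """Var[det(S_{IJ})] from Corollary 5.6 for S ~ W_r(n, Sigma), where I and J
    are disjoint lists of two 0-based indices each."""
    IJ = list(I) + list(J)
    d_II = np.linalg.det(Sigma[np.ix_(I, I)])
    d_JJ = np.linalg.det(Sigma[np.ix_(J, J)])
    d_IJ = np.linalg.det(Sigma[np.ix_(I, J)])
    d_full = np.linalg.det(Sigma[np.ix_(IJ, IJ)])
    return n * (n - 1) * ((n + 2) * d_II * d_JJ - n * d_full + 3 * n * d_IJ ** 2)

n = 10
# For Sigma = I_4 the formula reduces to 2n(n-1), the variance of an
# off-diagonal 2 x 2 minor of a standard Wishart matrix (Theorem 4.4).
var_standard = tetrad_variance(np.eye(4), [0, 1], [2, 3], n)
```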

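Theorem 5.7. Let $I,J \in \binom{[r]}{m}$ have intersection $C := I\cap J$ of cardinality $c = |C| = |I\cap J|$. Define $\bar I = I\setminus(I\cap J)$, $\bar J = J\setminus(I\cap J)$ and $\bar I\bar J = \bar I\cup\bar J$. Then the minor $\det(S_{I\times J}) = \det(S_{IJ})$ of the Wishart matrix $S \sim \mathcal{W}_r(n,\Sigma)$ has variance

$$
\begin{aligned}
\mathrm{Var}[\det(S_{IJ})] = \frac{n!}{(n-m)!}\det(\Sigma_{C\times C})^2\Bigl[\,&\det(\tilde\Sigma_{\bar I\bar J})^2\Bigl\{\frac{(n+2)!}{(n+2-m)!} - \frac{n!}{(n-m)!}\Bigr\}\\
&+ \frac{(n+2)!}{(n+2-c)!}\det(\tilde\Sigma_{\bar I\bar J\times\bar I\bar J})\\
&\quad\times\Bigl(\sum_{k=0}^{m-c-1}(m-c-k)!\cdot\frac{(n+2-c)!}{(n+2-c-k)!}\cdot(-1)^k\operatorname{tr}\{(\tilde\Sigma_{\bar J\bar I}\tilde\Sigma^{\bar I\bar J})^{(k)}\}\Bigr)\Bigr],
\end{aligned}
$$

where $\tilde\Sigma = \Sigma_{([r]\setminus C)\times([r]\setminus C)} - \Sigma_{([r]\setminus C)\times C}\,\Sigma_{C\times C}^{-1}\,\Sigma_{C\times([r]\setminus C)}$.

Proof. Define $\tilde S$ in analogy to $\tilde\Sigma$. Since $\det(S_{I\times J}) = \det(S_{C\times C})\det(\tilde S_{\bar I\times\bar J})$ and $S_{C\times C}$ and $\tilde S_{\bar I\times\bar J}$ are independent (Lemma 5.2), the claim follows from Propositions 5.1 and 5.5. □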

6. Conclusion. We study first and second moments of minors of a Wishart matrix, relying fundamentally on the properties of compound matrices. For a standard Wishart matrix $W$, invariant under $O(r)$, we extended classic invariance arguments due to Olkin and Rubin [13] to the case of compounds. This was possible because the Binet-Cauchy theorem implies that the distribution of the compound $W^{(m)}$ is invariant under compounds of matrices in $O(r)$. Note, however, that the distribution of $W^{(m)}$ is not invariant under all matrices in $O(\binom{r}{m})$.

Our results yield closed-form test statistics that are useful for evaluating 
the goodness of fit of hidden variable models; compare [6], Section 3. As an 
example, consider the model from Section 2 that is induced by the graph G3 
in Figure 1. For illustration we use classic data on physical variables for 305 
fifteen- year-old girls from the University of Chicago Lab schools; a correla- 
tion matrix is reported in [9], Table 7.1, page 169. We choose X\, . . . ,Xq as 
Height, Arm span, Length of forearm, Weight, Chest girth and Chest width. 
The partition in I = {X\, X2, X%} and J = {X<±, X§, Xq} thus separates vari- 
ables relating to lankiness from those relating to stockiness. We compute 
the I x J minor of the sample correlation matrix (recall Lemma 2.3) and 
estimate its sampling variance by inserting the correlation matrix into the 
formula from Proposition 5.5. When doing this we omit the first term in the 
formula because det(Ejj) is hypothesized to be zero. Comparing the ratio 
of sample minor and estimated standard deviation to the standard normal 
distribution gives a p-value of 0.42. In comparison, the likelihood ratio test 
computed using the EM algorithm has p-value 0.39, which also indicates a 
good model fit. Repeating the same procedure for a less meaningful variable 
partition obtained by exchanging Length of forearm ($X_3$) and Chest width
($X_6$) leads to p-values of 0.0034 and 0.0026 for the minor and the
likelihood ratio test, respectively. These results suggest that the
closed-form minor test may indeed have good power.
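
The comparison with the standard normal distribution can be sketched as follows. This is a minimal illustration that assumes a two-sided test; the function name and its arguments are hypothetical, and the variance estimate from Proposition 5.5 is taken as given rather than computed here.

```python
import math

def two_sided_p_value(sample_minor, estimated_sd):
    """Two-sided p-value of the standardized minor under a standard normal reference.

    Assumes the ratio sample_minor / estimated_sd is approximately N(0, 1)
    under the hypothesis that the population minor vanishes.
    """
    z = sample_minor / estimated_sd
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical inputs, for illustration only.
print(two_sided_p_value(0.8, 1.0))
```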

In the above example, the only data available were a sample correlation 
matrix, which we treated as if it were a sample covariance matrix. This is 
justified, however, because the ratio of sample minor and standard
deviation estimate is the same when evaluated over the sample correlation
matrix instead of the sample covariance matrix. This fact is a consequence
of the multilinearity of the determinant and the Binet-Cauchy theorem,
which implies that
$\mathrm{Var}_\Sigma[\det(S_{I\times J})] = (\prod_{i\in I}\sigma_{ii})(\prod_{j\in J}\sigma_{jj})\,\mathrm{Var}_R[\det(S_{I\times J})]$.
Here, $R$ is the correlation matrix of the covariance matrix $\Sigma$. While
we can justifiably compute standardized sample minors from correlation
matrices, our
Wishart distribution results do not yield the moments of minors of sample 
correlation matrices. The determination of these is an interesting problem 
for future research. The distribution of sample correlation matrices is orthog- 
onally invariant when the covariance matrix is a multiple of the identity but 
it is not so in general. 
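
The multilinearity argument can be illustrated numerically. The sketch below uses a hypothetical $4\times 4$ covariance matrix and checks that the $I\times J$ minor of the covariance matrix equals the corresponding minor of the correlation matrix multiplied by the standard deviations of the variables in $I$ and $J$. It illustrates only the scaling of the minor itself, not the variance formula of Proposition 5.5.

```python
import math

# Hypothetical 4x4 sample covariance matrix (illustrative values only).
S = [[4.0, 2.0, 1.0, 0.5],
     [2.0, 9.0, 3.0, 1.0],
     [1.0, 3.0, 16.0, 2.0],
     [0.5, 1.0, 2.0, 25.0]]
I, J = (0, 1), (2, 3)  # disjoint row and column index sets (0-based)

# Sample correlation matrix R = D^{-1/2} S D^{-1/2} with D = diag(S).
sd = [math.sqrt(S[i][i]) for i in range(4)]
R = [[S[i][j] / (sd[i] * sd[j]) for j in range(4)] for i in range(4)]

def minor_2x2(a, rows, cols):
    """Determinant of the 2 x 2 submatrix of a with the given rows and columns."""
    (i1, i2), (j1, j2) = rows, cols
    return a[i1][j1] * a[i2][j2] - a[i1][j2] * a[i2][j1]

# Each row i in I contributes a factor sd[i], each column j in J a factor sd[j].
scale = math.prod(sd[i] for i in I) * math.prod(sd[j] for j in J)
minor_S = minor_2x2(S, I, J)
minor_R = minor_2x2(R, I, J)
print(minor_S, scale * minor_R)
```

Because the same scale factor multiplies the minor and, squared, its variance, the standardized ratio is unchanged when passing from covariances to correlations.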

Our data example falls into a traditional large sample setting. We believe 
that minors may also be useful for high-dimensional settings in which the 
number of variables is large, perhaps even larger than the sample size. The 
reasoning behind this speculation is that sample minors may be formed 
from full rank submatrices even when the entire sample covariance matrix 
is singular. Clearly a likelihood ratio test against a saturated alternative is 
impossible under such singularity. 
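
A toy example of this situation, with hypothetical data, is sketched below: two observations of four variables give a singular $4\times 4$ cross-product matrix, while a $2\times 2$ minor with disjoint row and column sets is still nonzero. For simplicity the sketch uses the uncentered cross-product matrix $X^TX$, which assumes a known zero mean.

```python
def det(a):
    """Exact determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(a)
    if n == 0:
        return 1
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

# Hypothetical data: n = 2 observations (rows) of r = 4 variables (columns).
X = [[1, 2, 0, 1],
     [0, 1, 3, -1]]
r = 4

# Cross-product matrix S = X^T X; its rank is at most n = 2 < r.
S = [[sum(row[i] * row[j] for row in X) for j in range(r)] for i in range(r)]

I, J = (0, 1), (2, 3)
minor = det([[S[i][j] for j in J] for i in I])

print(det(S), minor)
```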

APPENDIX 

A.1. Proof of Lemma 2.3. Let $\mathbb{R}[\sigma]$ be the ring of polynomials
in the indeterminates $\sigma_{ij}$, $i\le j$. Define
$\mathcal{I}_1\subset\mathbb{R}[\sigma]$ to be the ideal generated by the
minor $\det(\Sigma_{I\times J})$. Since this minor is irreducible,
$\mathcal{I}_1$ is a prime ideal. Define
$\mathcal{I}_2\subset\mathbb{R}[\sigma]$ to be the ideal of all polynomials
that vanish when evaluated at a matrix $\Sigma\in C_m$. The ideal
$\mathcal{I}_2$ is also a prime ideal [5], Section 4.5. In Lemma 2.3, we
claim that $\mathcal{I}_1=\mathcal{I}_2$.

Let $V_1$ and $V_2$ be the irreducible varieties of complex matrices
$\Sigma$ such that $f(\Sigma)=0$ for all $f\in\mathcal{I}_1$ and all
$f\in\mathcal{I}_2$, respectively. In all distributions in the graphical
model induced by the graph $G_m$ defined in Section 2 it holds that
$X_I \perp\!\!\!\perp X_J \mid Y_{[m-1]}$. Hence, by Proposition 2.2(ii),
$\det(\Sigma_{I\times J})=0$ for all $\Sigma\in C_m$, which implies that
$\mathcal{I}_1\subseteq\mathcal{I}_2$ and $V_2\subseteq V_1$. Conversely, a
matrix $\Sigma\in V_1$ can be written as $\Sigma=\Omega+\Lambda\Lambda^T$
with $\Omega\in\mathbb{C}^{2m\times 2m}$ block-diagonal and
$\Lambda\in\mathbb{C}^{2m\times(m-1)}$. A polynomial in $\mathcal{I}_2$ must
vanish at such a matrix $\Sigma$. Thus $\Sigma\in V_2$, and consequently
$V_1=V_2$. Since $\mathcal{I}_1$ is a prime ideal it now follows from the
Strong Nullstellensatz [5], Section 4.2, that $\mathcal{I}_1=\mathcal{I}_2$.

A.2. Proof of Lemma 4.8. First, we emphasize that $I\cap J=\emptyset$,
$K\cap L=\emptyset$,
$I\cup J=\bar I\,\Delta\,\bar J=\bar K\,\Delta\,\bar L=K\cup L$, and
$|I|=|J|=|K|=|L|=m-c$. Defining $q=|I\cap L|=|J\cap K|$, it also holds that
$p+q=|I\cap K|+|I\cap L|=|I|=m-c$. Moreover, since
$|I\cap K|+|J\cap K|=|K|=m-c$, it holds that

\[ |I\cap K|=p=|J\cap L| \quad\text{and}\quad |I\cap L|=q=m-c-p=|J\cap K|. \]

By permuting the indices in $[r]$ if necessary (Proposition 3.4), we can
assume that

\begin{align*}
\bar I\cap\bar J &= \{1,\dots,c\},\\
I\cap K &= \{c+1,\dots,c+p\},\\
I\cap L &= \{c+p+1,\dots,m=c+p+q\},\\
J\cap K &= \{m+1,\dots,m+q\},\\
J\cap L &= \{m+q+1,\dots,2m-c=m+q+p\},\\
\bar K\cap\bar L &= \{2m-c+1,\dots,2m\}.
\end{align*}

As another convention, we enumerate the elements of the sets $\bar K$ and
$\bar L$ as $\bar K=(k_1,\dots,k_m)$ and $\bar L=(\ell_1,\dots,\ell_m)$,
respectively, while choosing $k_i=\ell_i$ for all $i\in[c]$.

Let $W=TT^T$ be the Choleski-decomposition of $W$ whose Choleski-factor
$T=(t_{ij})$ is lower-triangular with positive diagonal elements. By
Lemma 4.3,

\[ \det(W_{\bar I\times\bar J}) = \Big(\prod_{i=1}^{c} t_{ii}^2\Big)\Big(\prod_{i=c+1}^{m} t_{ii}\Big)\det(T_{J\times I}). \tag{A.1} \]
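
The representation (A.1) can be checked on a small example. The sketch below takes $r=5$, $m=3$ and $c=1$, so that in 0-based indexing the size-$m$ row set is $\{0,1,2\}$, the size-$m$ column set is $\{0,3,4\}$, and the reduced sets are $I=\{1,2\}$ and $J=\{3,4\}$. The lower-triangular integer matrix `T` is hypothetical and chosen only for illustration.

```python
def det(a):
    """Exact determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(a)
    if n == 0:
        return 1
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

def submatrix(a, rows, cols):
    return [[a[i][j] for j in cols] for i in rows]

# Hypothetical lower-triangular factor with positive diagonal.
T = [[2, 0, 0, 0, 0],
     [1, 3, 0, 0, 0],
     [-1, 2, 1, 0, 0],
     [4, -2, 3, 2, 0],
     [0, 1, -1, 5, 3]]
r, c = 5, 1

# W = T T^T
W = [[sum(T[i][k] * T[j][k] for k in range(r)) for j in range(r)] for i in range(r)]

I_bar, J_bar = (0, 1, 2), (0, 3, 4)   # row and column index sets of size m = 3
I, J = (1, 2), (3, 4)                 # the same sets with the common indices removed

lhs = det(submatrix(W, I_bar, J_bar))

rhs = det(submatrix(T, J, I))
for i in I_bar:
    # Diagonal entries with index below c enter squared, the others linearly.
    rhs *= T[i][i] ** 2 if i < c else T[i][i]

print(lhs, rhs)
```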



Whereas $\det(W_{\bar I\times\bar J})$ has the simple representation in
(A.1), this is not the case for $\det(W_{\bar K\times\bar L})$. However,
because we are interested in
$E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})]$, some
simplification is possible based on the following fact. Because $t_{ij}$,
$i>j$, are independent $\mathcal{N}(0,1)$, if
$(\alpha_{ij}\mid 1\le j\le i\le r)$ contains an entry $\alpha_{ij}$ that is
odd and such that $i>j$, then

\[ E\Big[\prod_{i\ge j} t_{ij}^{\alpha_{ij}}\Big] = 0. \tag{A.2} \]



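By the Binet-Cauchy theorem, $\det(W_{\bar K\times\bar L})$ is equal to

\[ \sum_{H\in\binom{[r]}{m}} \det(T_{\bar K\times H})\det(T_{\bar L\times H}) = \sum_{H\in\binom{[r]}{m}}\sum_{\sigma\in S_m}\sum_{\tau\in S_m} (-1)^{\sigma+\tau}\prod_{a=1}^{m} t_{k_a h_{\sigma(a)}}\, t_{\ell_a h_{\tau(a)}}, \]

where $H=\{h_1,\dots,h_m\}$. Since $k_a=\ell_a$ for $a\in[c]$,

\[ \prod_{a=1}^{c} t_{k_a h_{\sigma(a)}}\, t_{\ell_a h_{\tau(a)}} = \prod_{a=1}^{c} t_{k_a h_{\sigma(a)}}\, t_{k_a h_{\tau(a)}}. \tag{A.3} \]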



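We claim that

\[ f_1 = \sum_{H\in\binom{[r]}{m}}\sum_{\sigma\in S_m}\sum_{\substack{\tau\in S_m:\\ \tau(a)=\sigma(a)\,\forall a\in[c]}} (-1)^{\sigma+\tau}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+1}^{m} t_{k_b h_{\sigma(b)}}\, t_{\ell_b h_{\tau(b)}}\Big) \tag{A.4} \]

satisfies

\[ E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})] = E[\det(W_{\bar I\times\bar J})\cdot f_1]. \]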

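In order to see this, fix $H$ and $\sigma$, and assume that $\tau$ is such
that there exists $a\in[c]$ for which $\sigma(a)\neq\tau(a)$. Then
$h_{\sigma(a)}\neq k_a$ or $h_{\tau(a)}\neq k_a$. Without loss of
generality, assume that $h_{\sigma(a)}\neq k_a$. If $h_{\sigma(a)}>k_a$,
then $t_{k_a h_{\sigma(a)}}=0$ because $T$ is lower-triangular. If
$h_{\sigma(a)}<k_a$, then $t_{k_a h_{\sigma(a)}}$ appears with exponent 1 in
the monomial $\prod_{a=1}^{m} t_{k_a h_{\sigma(a)}} t_{\ell_a h_{\tau(a)}}$.
The index $k_a\in\bar K\cap\bar L$ is not an element of
$\bar I\cup\bar J$. Thus $t_{k_a h_{\sigma(a)}}$ appears with exponent 1 in

\[ \det(W_{\bar I\times\bar J})\cdot\prod_{a=1}^{m} t_{k_a h_{\sigma(a)}}\, t_{\ell_a h_{\tau(a)}}. \]

Therefore, according to (A.2), only monomials
$\prod_{a=1}^{m} t_{k_a h_{\sigma(a)}} t_{\ell_a h_{\tau(a)}}$ appearing in
$f_1$ may contribute to the expected value of
$\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})$. We can rewrite
(A.4) as

\[ f_1 = \sum_{H\in\binom{[r]}{m}}\sum_{\sigma\in S_m}(-1)^{\sigma}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+1}^{m} t_{k_b h_{\sigma(b)}}\Big)\sum_{\substack{\tau\in S_m:\\ \tau(a)=\sigma(a)\,\forall a\in[c]}}(-1)^{\tau}\prod_{b=c+1}^{m} t_{\ell_b h_{\tau(b)}}. \]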

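Now,

\begin{align*}
\sum_{\substack{\tau\in S_m:\\ \tau(a)=\sigma(a)\,\forall a\in[c]}}(-1)^{\tau}\Big(\prod_{b=c+1}^{m} t_{\ell_b h_{\tau(b)}}\Big)
&= (-1)^{\sigma}\sum_{\substack{\tau\in S_m:\\ \tau(a)=\sigma(a)\,\forall a\in[c]}}(-1)^{\tau\sigma^{-1}}\prod_{b=c+1}^{m} t_{\ell_b h_{(\tau\sigma^{-1})(\sigma(b))}}\\
&= (-1)^{\sigma}\sum_{\tau\in S_{\sigma(\{c+1,\dots,m\})}}(-1)^{\tau}\prod_{b=c+1}^{m} t_{\ell_b h_{\tau(\sigma(b))}}\\
&= (-1)^{\sigma}\det(T_{L\times(h_{\sigma(c+1)},\dots,h_{\sigma(m)})}).
\end{align*}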



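Therefore,

\[ f_1 = \sum_{H\in\binom{[r]}{m}}\sum_{\sigma\in S_m}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+1}^{m} t_{k_b h_{\sigma(b)}}\Big)\det(T_{L\times(h_{\sigma(c+1)},\dots,h_{\sigma(m)})}). \]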
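We have thus shown that
$E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})]$ is equal to
the expectation of

\begin{align*}
&\sum_{H\in\binom{[r]}{m}}\sum_{\sigma\in S_m}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+1}^{m} t_{k_b h_{\sigma(b)}}\Big)\\
&\quad\times\det(T_{L\times(h_{\sigma(c+1)},\dots,h_{\sigma(m)})})\Big(\prod_{a=1}^{c} t^2_{i_a i_a}\Big)\Big(\prod_{b=c+1}^{m} t_{i_b i_b}\Big)\det(T_{J\times I}). \tag{A.5}
\end{align*}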

We next claim that the expectation of (A.5) does not change when dropping
all terms associated with pairs $(H,\sigma)$ for which
$\{h_{\sigma(c+1)},\dots,h_{\sigma(m)}\}\neq I$. To see this, choose
$b\in\{c+1,\dots,m\}$ for which
$h_{\sigma(b)}\in\{h_{\sigma(c+1)},\dots,h_{\sigma(m)}\}\setminus I$. Now
consider three cases. First, if
$h_{\sigma(b)}\in(\bar K\cap\bar L)\cup(J\cap L)$, then
$h_{\sigma(b)}>k_b\in K$, and it follows that $t_{k_b h_{\sigma(b)}}=0$,
which leads to the vanishing of the term associated with $H$ and $\sigma$.
Second, if $h_{\sigma(b)}\in J\cap K$, then every nonzero term in the
expansion of $\det(T_{L\times(h_{\sigma(c+1)},\dots,h_{\sigma(m)})})$
involves an off-diagonal element of $T$ that does not appear in
$\det(T_{J\times I})$. Hence, every monomial of the term associated with
$(H,\sigma)$ features an off-diagonal element of $T$ raised to the power 1.
Therefore, by (A.2), the term associated with $(H,\sigma)$ has expectation
zero. The third case in which $h_{\sigma(b)}\in\bar I\cap\bar J$ is similar
to the second case just discussed.

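The claim just verified allows us to rewrite (A.5) as

\begin{align*}
&\sum_{\substack{H\in\binom{[r]}{m}:\\ I\subseteq H}}\ \sum_{\substack{\sigma\in S_m:\\ h_{\sigma(\{c+1,\dots,m\})}=I}}(-1)^{\nu_\sigma}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+1}^{m} t_{k_b h_{\sigma(b)}}\Big)\\
&\quad\times\Big(\prod_{a=1}^{c} t^2_{i_a i_a}\Big)\Big(\prod_{b=c+1}^{m} t_{i_b i_b}\Big)\det(T_{L\times I})\det(T_{J\times I}), \tag{A.6}
\end{align*}

where $\nu_\sigma$ is the permutation of $I=\{c+1,\dots,m\}$ that sorts
$h_{\sigma(c+1)},\dots,h_{\sigma(m)}$ in increasing order, that is,
$c+1=h_{\sigma(\nu_\sigma(c+1))}<\cdots<h_{\sigma(\nu_\sigma(m))}=m$. We now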
argue that the expectation of (A.6) does not change when replacing
$\det(T_{L\times I})$ by

\[ \Big(\prod_{\ell\in L\cap I} t_{\ell\ell}\Big)\det(T_{(J\cap L)\times(I\cap K)}). \tag{A.7} \]

In fact, we find (A.7) from the Laplace expansion along the diagonal
$t_{\ell\ell}$, $\ell\in L\cap I$, for which
$\det(T_{(J\cap L)\times(I\cap K)})$ serves as a cofactor. Every term in
$\det(T_{L\times I})$ that does not appear in (A.7) involves an
off-diagonal entry in $T$ of the form $t_{ab}$ with $a\in L\cap I$ and
$b\in I$, $a>b$. Such $t_{ab}$, however, does not appear in
$\det(T_{J\times I})$ since clearly $a<\min(J)$. Now, an appeal to (A.2)
closes the argument.

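Next, recall that $k_b\in I\cap K$ if $b\in\{c+1,\dots,c+p\}$. Hence, if
$b\in\{c+1,\dots,c+p\}$ but $h_{\sigma(b)}\neq k_b$, then the term
$t_{k_b h_{\sigma(b)}}$ does not appear in $\det(T_{J\times I})$. In other
words, a term in (A.6) based on $(H,\sigma)$ with $h_{\sigma(b)}\neq k_b$
has zero expectation by (A.2). Combining this observation with the
replacement in (A.7), we define

\begin{align*}
f_2 ={}& \sum_{\substack{H\in\binom{[r]}{m}:\\ I\subseteq H}}\ \sum_{\substack{\sigma\in S_m:\ h_{\sigma(\{c+1,\dots,m\})}=I,\\ h_{\sigma(c+j)}=k_{c+j}\in I\cap K\ \forall j\in[p]}}(-1)^{\nu_\sigma}\Big(\prod_{a=1}^{c} t^2_{k_a h_{\sigma(a)}}\Big)\Big(\prod_{b=c+p+1}^{m} t_{k_b h_{\sigma(b)}}\Big)\\
&\times\Big(\prod_{k\in I\cap K} t_{kk}\Big)\Big(\prod_{\ell\in I\cap L} t_{\ell\ell}\Big)\Big(\prod_{a=1}^{c} t^2_{i_a i_a}\Big)\Big(\prod_{b=c+1}^{m} t_{i_b i_b}\Big)\\
&\times\det(T_{(J\cap L)\times(I\cap K)})\det(T_{J\times I}), \tag{A.8}
\end{align*}

which satisfies
$E[f_2]=E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})]$.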

In our next simplification, we claim that if we replace
$\det(T_{J\times I})$ in $f_2$ by

\[ \det(T_{(J\cap L)\times(I\cap K)})\det(T_{(J\cap K)\times(I\cap L)}), \tag{A.9} \]

then the expectation does not change. This follows from (A.2) because every
term in the expansion of $\det(T_{J\times I})$ that does not appear in
(A.9) involves some $t_{ab}$ with $a\in J\cap L$ and $b\in I\cap L$, and
such $t_{ab}$ appears neither in $\det(T_{(J\cap L)\times(I\cap K)})$ nor
in $\prod_{b=c+p+1}^{m} t_{k_b h_{\sigma(b)}}$ because $k_b\in K$ satisfies
$k_b<\min(J\cap L)$. In $f_2$, $H\in\binom{[r]}{m}$ is such that
$I\subseteq H$ and $h_{\sigma(\{c+1,\dots,m\})}=I$ and therefore

\[ H=\{h_1,h_2,\dots,h_c\}\cup I. \tag{A.10} \]




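Using (A.9) and the fact that
$h_{\sigma(b)}\in I\setminus(I\cap K)=I\cap L$ if
$b\in\{c+p+1,\dots,m\}=I\cap L$, we obtain that
$E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})]$ is equal to
the expectation of

\begin{align*}
&\sum_{h_1\in[k_1]\setminus I}\ \sum_{h_2\in[k_2]\setminus(I\cup\{h_1\})}\cdots\sum_{h_c\in[k_c]\setminus(I\cup\{h_1,\dots,h_{c-1}\})}\Big(\prod_{i\in\bar I} t_{ii}^2\Big)\Big(\prod_{a=1}^{c} t^2_{k_a h_a}\Big)\det(T_{(J\cap L)\times(I\cap K)})\\
&\quad\times\sum_{\mu\in S_{I\cap L}}(-1)^{\mu}\prod_{b=c+p+1}^{m} t_{k_b\mu(b)} \tag{A.11}\\
&\quad\times\det(T_{(J\cap L)\times(I\cap K)})\det(T_{(J\cap K)\times(I\cap L)}).
\end{align*}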

In the simplification from (A.8) to (A.11) we replaced the two sums over
$H$ and $\sigma$ by the sums over $h_1,\dots,h_c$. This is possible because
of (A.10) and because by choosing appropriate $H$ and $\sigma$,
$h_{\sigma(a)}$ can take on any value in $[k_a]\setminus I$ while
respecting that all $h_{\sigma(a)}$, $a\in[c]$, must be different. In the
simplification from (A.8) to (A.11) we also replaced the permutation
$\nu_\sigma$ by a new permutation $\mu$. For this step, recall that
$\nu_\sigma$ in (A.8) is the permutation that brings
$h_{\sigma(c+1)},\dots,h_{\sigma(m)}$ in increasing order with
$h_{\sigma(c+1)}=k_{c+1}<h_{\sigma(c+2)}=k_{c+2}<\cdots<h_{\sigma(c+p)}=k_{c+p}$,
which implies that $\nu_\sigma(j)=j$ for all $j\in\{c+1,\dots,c+p\}$. Thus
the sign of $\nu_\sigma$ is equal to the sign of
$\nu_\sigma|_{\{c+p+1,\dots,m\}}$. The latter restriction is denoted by
$\mu$ in (A.11). Noting that $k_b\in J\cap K$ if $b\in\{c+p+1,\dots,m\}$,
we see that

\[ \sum_{\mu\in S_{I\cap L}}(-1)^{\mu}\prod_{b=c+p+1}^{m} t_{k_b\mu(b)} = \det(T_{(J\cap K)\times(I\cap L)}). \]

Thus, we have shown that
$E[\det(W_{\bar I\times\bar J})\det(W_{\bar K\times\bar L})]$ is equal to
the expectation of

\[ \Big(\prod_{i\in\bar I} t_{ii}^2\Big)\det(T_{(L\cap J)\times(I\cap K)})^2\det(T_{(K\cap J)\times(I\cap L)})^2\prod_{a=1}^{c}\Big(\sum_{h\in\{a,\dots,k_a\}\setminus I} t^2_{k_a h}\Big). \tag{A.12} \]

Since $t_{ii}^2\sim\chi^2_{n-i+1}$ and moreover,

\[ \sum_{h\in\{a,\dots,k_a\}\setminus I} t^2_{k_a h}\sim\chi^2_{(n-k_a+1)+(k_a-a)-|I|}=\chi^2_{(n-a+1)-(m-c)}=\chi^2_{n-m+c-a+1}, \]

this proof can be completed using the results on expected values from the
proof of Theorem 4.4.



A.3. A noncentral Wishart determinant. As in Lemma 5.4, we consider
$X\in\mathbb{R}^{m\times m}$ distributed according to
$\mathcal{N}_{m^2}(\Lambda, I_m\otimes I_m)$,
$\Lambda=(\lambda_{ij})\in\mathbb{R}^{m\times m}$. From the independence of
the entries of $X$, it follows that

\[ E[\det(X)]=\det(\Lambda). \tag{A.13} \]

If $\Lambda$ is nonzero, then the matrix $XX^T$ follows a noncentral
Wishart distribution. Theorem 10.3.7 in [12] provides a general formula for
moments of the determinant of a noncentral Wishart matrix in terms of
hypergeometric functions with matrix argument. Here, $X$ is a square matrix
and we can give a simple formula for $E[\det(X)^2]=E[\det(XX^T)]$ that
involves only traces and compounds.



Lemma A.1. The expectation of $\det(XX^T)=\det(X)^2$ can be expressed as

\[ E[\det(X)^2]=\sum_{k=0}^{m}(m-k)!\cdot\mathrm{tr}[(\Lambda\Lambda^T)^{(k)}]. \]

Here, $(\Lambda\Lambda^T)^{(0)}:=1\in\mathbb{R}$ and
$(\Lambda\Lambda^T)^{(m)}=\det(\Lambda\Lambda^T)$. By (A.13),

\[ \mathrm{Var}[\det(X)]=\sum_{k=0}^{m-1}(m-k)!\cdot\mathrm{tr}[(\Lambda\Lambda^T)^{(k)}]. \]

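Proof. Let $S_m$ be the group of permutations of $[m]$. Then,

\begin{align*}
E[\det(X)^2] &= \sum_{\sigma\in S_m}\sum_{\tau\in S_m}(-1)^{\sigma+\tau}E\Big[\prod_{j=1}^{m} x_{\sigma(j)j}\,x_{\tau(j)j}\Big]\\
&= \sum_{\sigma\in S_m}\sum_{\tau\in S_m}(-1)^{\sigma+\tau}\prod_{j=1}^{m}\big(\delta_{\sigma(j)\tau(j)}+\lambda_{\sigma(j)j}\lambda_{\tau(j)j}\big),
\end{align*}

where $\delta_{ij}$ is the Kronecker delta. The product

\[ \prod_{j=1}^{m}\big(\delta_{\sigma(j)\tau(j)}+\lambda_{\sigma(j)j}\lambda_{\tau(j)j}\big)=\sum_{J\subseteq[m]}\Big(\prod_{j\in J}\lambda_{\sigma(j)j}\lambda_{\tau(j)j}\Big)\cdot\Big(\prod_{j\notin J}\delta_{\sigma(j)\tau(j)}\Big). \]

Therefore, if we define

\[ g_J(\sigma)=\sum_{\substack{\tau\in S_m:\\ \tau(j)=\sigma(j)\,\forall j\notin J}}(-1)^{\sigma+\tau}\prod_{j\in J}\lambda_{\sigma(j)j}\lambda_{\tau(j)j}, \]

then

\[ E[\det(X)^2]=\sum_{k=0}^{m}\sum_{J\in\binom{[m]}{k}}\sum_{\sigma\in S_m} g_J(\sigma). \]

Note that the permutations appearing in the definition of $g_J(\sigma)$
satisfy $\tau(J)=\sigma(J)$.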

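Let $\sigma_1,\sigma_2\in S_m$ be two permutations such that
$\sigma_1(j)=\sigma_2(j)$ for all $j\in J$. Moreover, let
$\tau_1,\tau_2\in S_m$ satisfy $\tau_1(j)=\tau_2(j)$ for all $j\in J$,
$\tau_1(j)=\sigma_1(j)$ for all $j\notin J$, and $\tau_2(j)=\sigma_2(j)$
for all $j\notin J$. Then it holds for the permutation signs that
$(-1)^{\sigma_1}(-1)^{\tau_1}=(-1)^{\sigma_2}(-1)^{\tau_2}$. This implies
that $g_J(\sigma_1)=g_J(\sigma_2)$. We obtain

\[ E[\det(X)^2]=\sum_{k=0}^{m}(m-k)!\sum_{J\in\binom{[m]}{k}}\sum_{I\in\binom{[m]}{k}}\sum_{\sigma:J\to I}\sum_{\tau:J\to I}(-1)^{\sigma+\tau}\prod_{j\in J}\lambda_{\sigma(j)j}\lambda_{\tau(j)j}, \tag{A.14} \]

where the two inner sums run over bijections from $J$ to $I$, with signs
taken relative to the increasing orderings of $J$ and $I$.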

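By the Binet-Cauchy theorem,

\begin{align*}
E[\det(X)^2] &= \sum_{k=0}^{m}(m-k)!\sum_{I\in\binom{[m]}{k}}\sum_{J\in\binom{[m]}{k}}\det(\Lambda_{I\times J})^2\\
&= \sum_{k=0}^{m}(m-k)!\sum_{I\in\binom{[m]}{k}}\det\big(\Lambda_{I\times[m]}\Lambda_{I\times[m]}^T\big)\\
&= \sum_{k=0}^{m}(m-k)!\cdot\mathrm{tr}[(\Lambda\Lambda^T)^{(k)}].
\end{align*}

□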
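
The identity of Lemma A.1 can be checked exactly for small $m$. The sketch below compares the trace formula with a brute-force evaluation of the double sum over permutations from the first step of the proof, using $E[x_{aj}x_{bj}]=\delta_{ab}+\lambda_{aj}\lambda_{bj}$ and independence across columns. The function names and the integer mean matrices are hypothetical and serve only as an illustration.

```python
from itertools import combinations, permutations
from math import factorial, prod

def sign(p):
    """Sign of a permutation given as a tuple, computed from its inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(a):
    """Exact determinant via the Leibniz formula (small matrices only)."""
    n = len(a)
    return sum(sign(p) * prod(a[p[j]][j] for j in range(n))
               for p in permutations(range(n)))

def second_moment_bruteforce(lam):
    """E[det(X)^2] from the double sum over permutations in the proof."""
    m = len(lam)
    total = 0
    for s in permutations(range(m)):
        for t in permutations(range(m)):
            term = sign(s) * sign(t)
            for j in range(m):
                # E[x_{s(j) j} x_{t(j) j}] = delta_{s(j) t(j)} + lam_{s(j) j} lam_{t(j) j}
                term *= (s[j] == t[j]) + lam[s[j]][j] * lam[t[j]][j]
            total += term
    return total

def second_moment_formula(lam):
    """Sum over k of (m - k)! times the trace of the k-th compound of lam lam^T."""
    m = len(lam)
    g = [[sum(lam[i][k] * lam[j][k] for k in range(m)) for j in range(m)]
         for i in range(m)]
    total = 0
    for k in range(m + 1):
        # The trace of the k-th compound is the sum of the principal k x k minors.
        trace_k = sum(det([[g[i][j] for j in idx] for i in idx])
                      for idx in combinations(range(m), k))
        total += factorial(m - k) * trace_k
    return total

lam2 = [[1, 2], [3, 4]]                       # hypothetical mean matrices
lam3 = [[1, 0, 2], [-1, 3, 1], [2, 1, 0]]
print(second_moment_formula(lam2), second_moment_bruteforce(lam2))
print(second_moment_formula(lam3) == second_moment_bruteforce(lam3))
```

For the $2\times 2$ example both functions return 36, which agrees with the hand computation $2+\mathrm{tr}(\Lambda\Lambda^T)+\det(\Lambda)^2 = 2+30+4$.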